AITopics | dataflow architecture

dataflow architecture

Learned Cost Model for Placement on Reconfigurable Dataflow Hardware

Guha, Etash, Jiang, Tianxiao, Deng, Andrew, Zhang, Jian, Annamalai, Muthu

arXiv.org Artificial Intelligence, Nov-5-2025

Mapping the dataflow graph of an ML model onto a reconfigurable system is difficult, as different mappings have different throughputs and consume constrained resources differently. To solve this, a model that evaluates the throughput of mappings is necessary, as measuring throughput completely is expensive. Many use a hand-designed analytical model that relies on proxy features or intuition, which introduces error. We provide a learned approach that predicts throughput 31%-52% more accurately over a variety of graphs. In addition, our approach shows no accuracy degradation after removing performance annotations. We show that using this approach results in 5.6% faster compiled graphs.

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.01872

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Executable Ontologies: Synthesizing Event Semantics with Dataflow Architecture

Boldachev, Aleksandr

arXiv.org Artificial Intelligence, Sep-17-2025

This paper presents boldsea, Boldachev's semantic-event approach -- an architecture for modeling complex dynamic systems using executable ontologies -- semantic models that act as dynamic structures, directly controlling process execution. We demonstrate that integrating event semantics with a dataflow architecture addresses the limitations of traditional Business Process Management (BPM) systems and object-oriented semantic technologies. The paper presents the formal BSL (boldsea Semantic Language), including its BNF grammar, and outlines the boldsea-engine's architecture, which directly interprets semantic models as executable algorithms without compilation. It enables the modification of event models at runtime, ensures temporal transparency, and seamlessly merges data and business logic within a unified semantic framework.

artificial intelligence, natural language, text processing, (19 more...)

arXiv.org Artificial Intelligence

2509.09775

Genre:

Workflow (0.46)
Research Report (0.40)

Industry: Banking & Finance (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)

LUTMUL: Exceed Conventional FPGA Roofline Limit by LUT-based Efficient Multiplication for Neural Network Inference

Xie, Yanyue, Li, Zhengang, Diaconu, Dana, Handagala, Suranga, Leeser, Miriam, Lin, Xue

arXiv.org Artificial Intelligence, Oct-31-2024

For FPGA-based neural network accelerators, digital signal processing (DSP) blocks have traditionally been the cornerstone for handling multiplications. This paper introduces LUTMUL, which harnesses the potential of look-up tables (LUTs) for performing multiplications. LUTs typically outnumber DSPs by a factor of 100, offering a significant computational advantage. By exploiting this advantage, our method demonstrates a potential boost in the performance of FPGA-based neural network accelerators with a reconfigurable dataflow architecture. Our approach challenges the conventional peak performance of DSP-based accelerators and sets a new benchmark for efficient neural network inference on FPGAs. Experimental results demonstrate that our design achieves the best inference speed among all FPGA-based accelerators, with a throughput of 1627 images per second and a top-1 accuracy of 70.95% on the ImageNet dataset.
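As a rough illustration of the general idea of replacing a multiplier with a table lookup, the sketch below tabulates the product of a fixed weight with every possible low-bit activation value. This is not the LUTMUL design itself, which the abstract does not detail; the function names, the 4-bit unsigned activation width, and the example values are all assumptions made for illustration.

```python
def build_product_lut(weight, activation_bits=4):
    """Precompute weight * a for every unsigned activation value a.

    Hypothetical helper: with a weight fixed at inference time, the
    product depends only on the activation, so it fits in a small table.
    """
    return [weight * a for a in range(1 << activation_bits)]


def lut_dot(weights, activations, activation_bits=4):
    """Dot product that uses table lookups instead of multiplications."""
    # One table per weight; in this sketch they are rebuilt on every call.
    luts = [build_product_lut(w, activation_bits) for w in weights]
    # Each activation indexes its weight's table to fetch the product.
    return sum(lut[a] for lut, a in zip(luts, activations))


# Illustrative values only (not taken from the paper).
weights = [3, -2, 5]
activations = [7, 15, 1]
result = lut_dot(weights, activations)
reference = sum(w * a for w, a in zip(weights, activations))
print(result, reference)
```

The table for each weight has only 16 entries at 4 bits, which is why low-precision quantization makes this kind of lookup practical.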

artificial intelligence, fpga, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3658617.3697687

2411.11852

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.16)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

HASS: Hardware-Aware Sparsity Search for Dataflow DNN Accelerator

Yu, Zhewen, Sreeram, Sudarshan, Agrawal, Krish, Wu, Junyi, Montgomerie-Corcoran, Alexander, Zhang, Cheng, Cheng, Jianyi, Bouganis, Christos-Savvas, Zhao, Yiren

arXiv.org Artificial Intelligence, Jun-5-2024

Deep Neural Networks (DNNs) excel in learning hierarchical representations from raw data, such as images, audio, and text. To compute these DNN models with high performance and energy efficiency, they are usually deployed onto customized hardware accelerators. Among various accelerator designs, dataflow architecture has shown promising performance due to its layer-pipelined structure and its scalability in data parallelism. Exploiting weight and activation sparsity can further enhance memory storage and computation efficiency. However, existing approaches focus on exploiting sparsity in non-dataflow accelerators and cannot be applied to dataflow accelerators because of the large hardware design space introduced. As such, they could miss opportunities to find an optimal combination of sparsity features and hardware designs. In this paper, we propose a novel approach to exploit unstructured weight and activation sparsity for dataflow accelerators, using software and hardware co-optimization. We propose a Hardware-Aware Sparsity Search (HASS) to systematically determine an efficient sparsity solution for dataflow accelerators. Over a set of models, we achieve an efficiency improvement ranging from 1.3x to 4.2x compared to existing sparse designs, which are either non-dataflow or non-hardware-aware. In particular, the throughput of MobileNetV3 can be optimized to 4895 images per second. HASS is open-source: https://github.com/Yu-Zhewen/HASS

accelerator, computation, sparsity, (17 more...)

arXiv.org Artificial Intelligence

2406.03088

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Greater London > London (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

INR-Arch: A Dataflow Architecture and Compiler for Arbitrary-Order Gradient Computations in Implicit Neural Representation Processing

Abi-Karam, Stefan, Sarkar, Rishov, Xu, Dejia, Fan, Zhiwen, Wang, Zhangyang, Hao, Cong

arXiv.org Artificial Intelligence, Aug-11-2023

An increasing number of researchers are finding use for nth-order gradient computations for a wide variety of applications, including graphics, meta-learning (MAML), scientific computing, and most recently, implicit neural representations (INRs). Recent work shows that the gradient of an INR can be used to edit the data it represents directly without needing to convert it back to a discrete representation. However, given a function represented as a computation graph, traditional architectures face challenges in efficiently computing its nth-order gradient due to the higher demand for computing power and higher complexity in data movement. This makes it a promising target for FPGA acceleration. In this work, we introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture. We address this problem in two phases. First, we design a dataflow architecture that uses FIFO streams and an optimized computation kernel library, ensuring high memory efficiency and parallel computation. Second, we propose a compiler that extracts and optimizes computation graphs, automatically configures hardware parameters such as latency and stream depths to optimize throughput, while ensuring deadlock-free operation, and outputs High-Level Synthesis (HLS) code for FPGA implementation. We utilize INR editing as our benchmark, presenting results that demonstrate 1.8-4.8x and 1.5-3.6x speedup compared to CPU and GPU baselines respectively. Furthermore, we obtain 3.1-8.9x and 1.7-4.3x lower memory usage, and 1.7-11.3x and 5.5-32.8x lower energy-delay product. Our framework will be made open-source and available on GitHub.

artificial intelligence, graph, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2308.0593

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Europe > Finland > Paijanne Tavastia > Lahti (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Mixed-TD: Efficient Neural Network Accelerator with Layer-Specific Tensor Decomposition

Yu, Zhewen, Bouganis, Christos-Savvas

arXiv.org Artificial Intelligence, Jun-22-2023

Neural network designs are quite diverse, from VGG-style to ResNet-style, and from Convolutional Neural Networks to Transformers. Towards the design of efficient accelerators, many works have adopted a dataflow-based, inter-layer pipelined architecture, with customised hardware for each layer, achieving ultra-high throughput and low latency. The deployment of neural networks to such dataflow architecture accelerators is usually hindered by the available on-chip memory, as it is desirable to preload the weights of neural networks on-chip to maximise system performance. To address this, networks are usually compressed before deployment through methods such as pruning, quantization and tensor decomposition. In this paper, a framework for mapping CNNs onto FPGAs based on a novel tensor decomposition method called Mixed-TD is proposed. The proposed method applies layer-specific Singular Value Decomposition (SVD) and Canonical Polyadic Decomposition (CPD) in a mixed manner, achieving 1.73x to 10.29x throughput per DSP compared to state-of-the-art CNNs. Our work is open-sourced: https://github.com/Yu-Zhewen/Mixed-TD

artificial intelligence, machine learning, throughput, (19 more...)

arXiv.org Artificial Intelligence

2306.05021

Country:

Europe > United Kingdom > England > Greater London > London (0.04)
Africa > Senegal > Kolda Region > Kolda (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Dataflow Architectures: Flexible Platforms for Neural Network Simulation

Neural Information Processing Systems, Apr-6-2023, 19:49:09 GMT

Dataflow architectures are general computation engines optimized for the execution of fine-grain parallel algorithms. Neural networks can be simulated on these systems with certain advantages. In this paper, we review dataflow architectures, examine neural network simulation performance on a new generation dataflow machine, compare that performance to other simulation alternatives, and discuss the benefits and drawbacks of the dataflow approach. Dataflow architectures are general computation engines that treat each instruction of a program as a separate task which is scheduled in an asynchronous, data-driven fashion. Dataflow programs are compiled into graphs which explicitly describe the data dependencies of the computation.
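The data-driven scheduling described above can be sketched in a few lines: each node of the graph fires as soon as all of its operands are available, with no program counter fixing the order. This is a minimal sequential simulation under assumed names (`run_dataflow`, the node dictionary layout, and the example graph are hypothetical), not the machine evaluated in the paper.

```python
from collections import deque
import operator


def run_dataflow(nodes, inputs):
    """Execute a dataflow graph in data-driven order.

    nodes:  dict mapping node name -> (function, list of operand names)
    inputs: dict mapping input name -> value
    Returns (all computed values, the order in which nodes fired).
    """
    values = dict(inputs)
    consumers = {}  # operand name -> nodes waiting on that operand
    missing = {}    # node name -> number of operands not yet available
    for name, (_, operands) in nodes.items():
        missing[name] = sum(1 for op in operands if op not in values)
        for op in operands:
            consumers.setdefault(op, []).append(name)

    # Nodes whose operands are all present may fire immediately.
    ready = deque(name for name, count in missing.items() if count == 0)
    order = []
    while ready:
        name = ready.popleft()
        func, operands = nodes[name]
        values[name] = func(*(values[op] for op in operands))
        order.append(name)
        # The new result may enable downstream nodes.
        for consumer in consumers.get(name, []):
            missing[consumer] -= 1
            if missing[consumer] == 0:
                ready.append(consumer)
    return values, order


# Example graph for (a + b) * (a - b); "sum" and "diff" are independent.
graph = {
    "prod": (operator.mul, ["sum", "diff"]),
    "sum": (operator.add, ["a", "b"]),
    "diff": (operator.sub, ["a", "b"]),
}
values, order = run_dataflow(graph, {"a": 5, "b": 3})
print(values["prod"], order)
```

Here `sum` and `diff` have no dependency on each other, so a real dataflow machine could execute them in parallel; the queue in this sketch simply serializes whatever is ready.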

dataflow architecture, flexible platform, neural network simulation, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
Information Technology > Communications > Networks (0.68)

FlowGNN: A Dataflow Architecture for Real-Time Workload-Agnostic Graph Neural Network Inference

Sarkar, Rishov, Abi-Karam, Stefan, He, Yuqi, Sathidevi, Lakshmi, Hao, Cong

arXiv.org Artificial Intelligence, Oct-19-2022

Graph neural networks (GNNs) have recently exploded in popularity thanks to their broad applicability to graph-related problems such as quantum chemistry, drug discovery, and high energy physics. However, meeting demand for novel GNN models and fast inference simultaneously is challenging due to the gap between developing efficient accelerators and the rapid creation of new GNN models. Prior art focuses on accelerating specific classes of GNNs, such as Graph Convolutional Networks (GCN), but lacks generality to support a wide range of existing or new GNN models. Furthermore, most works rely on graph pre-processing to exploit data locality, making them unsuitable for real-time applications. To address these limitations, in this work, we propose a generic dataflow architecture for GNN acceleration, named FlowGNN, which is generalizable to the majority of message-passing GNNs. The contributions are three-fold. First, we propose a novel and scalable dataflow architecture, which generally supports a wide range of GNN models with message-passing mechanism. The architecture features a configurable dataflow optimized for simultaneous computation of node embedding, edge embedding, and message passing, which is generally applicable to all models. We also propose a rich library of model-specific components. Second, we deliver ultra-fast real-time GNN inference without any graph pre-processing, making it agnostic to dynamically changing graph structures. Third, we verify our architecture on the Xilinx Alveo U50 FPGA board and measure the on-board end-to-end performance. We achieve a speed-up of up to 24-254x against CPU (6226R) and 1.3-477x against GPU (A6000) (with batch sizes 1 through 1024); we also outperform the SOTA GNN accelerator I-GCN by 1.26x speedup and 1.55x energy efficiency over four datasets. Our implementation code and on-board measurement are publicly available on GitHub.

artificial intelligence, machine learning, node, (18 more...)

arXiv.org Artificial Intelligence

2204.13103

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Monterey County > Seaside (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Architecture (1.00)

The Promise of Dataflow Architectures in the Design of Processing Systems for Autonomous Machines

Liu, Shaoshan, Zhu, Yuhao, Yu, Bo, Gaudiot, Jean-Luc, Gao, Guang R.

arXiv.org Artificial Intelligence, Sep-14-2021

The commercialization of autonomous machines is a thriving sector, and likely to be the next major computing demand driver, after PC, cloud computing, and mobile computing. Nevertheless, a suitable computer architecture for autonomous machines is missing, and many companies are forced to develop ad hoc computing solutions that are neither scalable nor extensible. In this article, we analyze the demands of autonomous machine computing, and argue for the promise of dataflow architectures in autonomous machines. The commercialization of autonomous machines is a thriving sector, with projected average compound annual growth rate (CAGR) of 26%, and by 2030 this sector will have a market size of $1 trillion [1]. Hence, this sector is likely to be the next major computing demand driver, after personal computers, cloud computing, and mobile computing. Autonomous machines exist in multiple forms, e.g., cars, aerial drones, service robots, industrial robots.

architecture, autonomous machine, dataflow architecture, (13 more...)

arXiv.org Artificial Intelligence

2109.07047

Country: Asia > Japan > Honshū > Kansai > Hyogo Prefecture > Kobe (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.47)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.54)

The Next Wave of Deep Learning Architectures

#artificialintelligence, Jun-2-2017, 15:30:18 GMT

Intel has planted some solid stakes in the ground for the future of deep learning over the last month with its acquisition of deep learning chip startup, Nervana Systems, and most recently, mobile and embedded machine learning company, Movidius. These new pieces will snap into Intel's still-forming puzzle for capturing the supposed billion-plus dollar market ahead for deep learning, which is complemented by its own Knights Mill effort and software optimization work on machine learning codes and tooling. At the same time, just down the coast, Nvidia is firming up the market for its own GPU training and inference chips as well as its own hardware outfitted with the latest Pascal GPUs and requisite deep learning libraries. While Intel's efforts have garnered significant headlines recently with that surprising pair of acquisitions, a move which is pushing Nvidia harder to demonstrate GPU acceleration (thus far the dominant compute engine for model training) for deep learning, they still have some work to do to capture mindshare for this emerging market. Further complicating this is the fact that the last two years have brought a number of newcomers to the field--deep learning chip upstarts touting the idea that general purpose architectures (including GPUs) cannot compare to a low precision, fixed point, specialized approach.

artificial intelligence, deep learning, machine learning, (13 more...)

#artificialintelligence

Industry: Information Technology (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
